1. INTRODUCTION

Vision-based measurement has been an active research topic in recent decades, and many
applications have been developed using it [1]. Methods of 3D measurement can be categorized as
active or passive. Active measurement relies on structured illumination or a laser and is not
applicable in many cases. Passive 3D measurement is based on stereo vision and has several advantages
over active measurement: it requires simpler instrumentation and is therefore applicable in more
environments. However, the major issue for passive measurement is the difficulty of finding accurate
correspondences between stereo images [2].

Stereo calibration is the most important step in finding corresponding points. Camera calibration is
required to ensure that both cameras are correctly positioned and to remove distortion. Traditionally,
camera calibration is performed using a standard chessboard pattern [3]. However, much work is still
required on self-calibration methods. Stereo self-calibration refers to the automatic determination of
stereo camera parameters from image sequences.

Self-calibration is an important capability for bringing stereo cameras to the market, and many
works have been published on it [4-11]. It enables maintenance-free, long-term operation even when
environmental conditions change the camera position. Offline calibration requires special expertise, so
self-calibration can reduce the time and cost of regular offline calibration. By analogy, even when the two
human eyes have different characteristics (minus, plus, or cylindrical properties), the human brain
adjusts automatically, so a person has no difficulty merging the views from the left and right eyes.
In designing a self-calibration method, a matching algorithm is an important tool for finding
corresponding points between the images of the two cameras.

The main objective of this paper is to analyze the performance of three matching algorithms for
the autocalibration process. Two of the most common techniques for stereo correspondence are the sum of
absolute differences (SAD) and the sum of squared differences (SSD), in which corresponding points
between images are obtained by minimizing SAD or SSD in area-based block matching [12]. The major
drawback of these two techniques is their low accuracy. An improvement using sub-pixel block matching
has been explored in [4], but the accuracy obtained was still insufficient. Recently, many image
matching algorithms based on various techniques have been proposed [13]. In this work, a set of
experiments demonstrates that the stereo vision system employing the proposed technique can measure 3D
surfaces of free-form objects with sub-mm accuracy. The three matching techniques used in this research
are SIFT, SURF, and ORB. The matching algorithm provides the characteristics of each camera [14], which
are used to transform the second image and thereby perform automatic stereo calibration. Each algorithm
is described below.

- SIFT 

Scale-invariant feature transform (SIFT) is a matching algorithm proposed by Lowe [15].
It works very well in finding corresponding points in images that have been rotated or otherwise
transformed. The algorithm consists of four steps. The first step is the estimation of scale-space
extrema using the Difference of Gaussian method, expressed in (1) and illustrated in Figure 1.


$$D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma) \tag{1}$$

Figure 1. The estimation of scale-space extrema using the Difference of Gaussian method
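
As an illustration of (1), the following minimal sketch computes a Difference of Gaussian response in pure Python. It is not the implementation used in this work: the image is assumed to be a nested list of floats, the kernel is truncated at three standard deviations, borders are clamped, and the names `gaussian_kernel`, `blur`, and `difference_of_gaussian` are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at 3 * sigma (assumed cutoff)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _clamp(value, upper):
    """Clamp an index to the range 0..upper."""
    return min(max(value, 0), upper)


def blur(image, sigma):
    """L(x, y, sigma) = G(x, y, sigma) * I(x, y), computed separably."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    h, w = len(image), len(image[0])
    # Horizontal pass, then vertical pass over the intermediate result.
    rows = [[sum(kernel[k + r] * image[y][_clamp(x + k, w - 1)]
                 for k in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(kernel[k + r] * rows[_clamp(y + k, h - 1)][x]
                 for k in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


def difference_of_gaussian(image, sigma, k=math.sqrt(2)):
    """D(x, y, sigma) = L(x, y, k * sigma) - L(x, y, sigma), as in (1)."""
    coarse = blur(image, k * sigma)
    fine = blur(image, sigma)
    return [[c - f for c, f in zip(c_row, f_row)]
            for c_row, f_row in zip(coarse, fine)]
```

On a constant image the response is zero everywhere, which is why only blobs and corners survive as extrema candidates. The value of `k` is an assumed default, not a parameter reported in this paper.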


In the next step, the key point candidates are refined by eliminating those with low values. The
scale-normalized Laplacian of Gaussian, $\sigma^2 \nabla^2 G$, is used since it produces more stable
image features than other operators. The relation between the Difference of Gaussian and the Laplacian
of Gaussian can be expressed using (2) and (3).


$$\sigma \nabla^2 G = \frac{\partial G}{\partial \sigma} \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma} \tag{2}$$

$$G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2 \nabla^2 G \tag{3}$$

The key point orientation is then assigned using the local image gradient. The final step is the
computation of the local image descriptor based on the gradient magnitude and orientation around the
key point. Because of its algorithmic complexity, SIFT requires a large computational capacity, even
though it is very suitable for object recognition applications [16, 17].
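
The gradient-based orientation assignment can be sketched as follows. This is a simplified illustration under stated assumptions, not the full SIFT procedure: gradients are central differences, votes are weighted by gradient magnitude only (no Gaussian window), the 36-bin histogram is an assumed default, and the function name `dominant_orientation` is hypothetical.

```python
import math


def dominant_orientation(patch, bins=36):
    """Return the lower edge (degrees) of the strongest gradient-orientation bin.

    Angles are measured in image coordinates, with y increasing downwards.
    """
    h, w = len(patch), len(patch[0])
    bin_width = 360.0 / bins
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            # Each interior pixel votes with its gradient magnitude.
            hist[int(angle // bin_width) % bins] += magnitude
    peak = max(range(bins), key=hist.__getitem__)
    return peak * bin_width
```

For a patch whose intensity increases from left to right, the function returns 0.0, so rotating the descriptor by this angle makes it independent of the in-plane rotation of the object.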

- SURF 

The speeded-up robust features (SURF) technique performs faster than SIFT [18] and, in some cases,
with equal quality. Like SIFT, SURF consists of a detector and a descriptor. Instead of Gaussian
averaging of the image, SURF uses square (box) filters as an approximation. It employs a Hessian
matrix-based blob detector to find the points of interest. Orientation is assigned from wavelet
responses weighted with a Gaussian. The SURF feature descriptor is generated from the wavelet responses
of subregions, which are obtained by dividing the neighborhood around the key point. Two points form
a correspondence (match) only if they have the same type of contrast, as given by the sign of the
Laplacian.
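
The square approximations used by SURF are cheap because the sum over any axis-aligned box can be read from an integral image with four lookups. The sketch below shows only that building block, assuming an image stored as a nested list of numbers; the function names `integral_image` and `box_sum` are hypothetical, and the Hessian detector itself is not reproduced here.

```python
def integral_image(image):
    """Summed-area table with one extra zero row and column."""
    h, w = len(image), len(image[0])
    table = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def box_sum(table, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1."""
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = integral_image(image)
print(box_sum(table, 1, 1, 3, 3))  # sum of the lower-right 2x2 block
```

Because the cost of `box_sum` does not depend on the box size, filters of any scale cost the same, which is one reason SURF runs faster than SIFT.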

- ORB 

Oriented FAST and rotated BRIEF (ORB) was proposed by Rublee et al. [19] as another alternative to
SIFT. ORB is a combination of the FAST key point detector and the BRIEF descriptor. FAST is used to
detect the key points [20]. In the next step, the Harris corner measure is used to select the top N
points. ORB then computes the intensity-weighted centroid of the patch whose center is the detected
corner. The orientation is given by the direction of the vector from the corner to the centroid.
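
The intensity-centroid orientation can be written in a few lines. The sketch below is illustrative only: it assumes a square patch given as a nested list of intensities, takes the geometric center of the patch as the corner, and uses the hypothetical function name `centroid_orientation`.

```python
import math


def centroid_orientation(patch):
    """Angle (radians) of the vector from the patch center to its intensity centroid."""
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # First-order image moments taken relative to the patch center.
    m10 = sum((x - cx) * patch[y][x] for y in range(h) for x in range(w))
    m01 = sum((y - cy) * patch[y][x] for y in range(h) for x in range(w))
    return math.atan2(m01, m10)


bright_right = [[0, 0, 1],
                [0, 0, 1],
                [0, 0, 1]]
print(centroid_orientation(bright_right))  # centroid lies to the right of the center
```

The BRIEF sampling pattern is rotated by this angle before the binary tests are evaluated, which gives the descriptor its rotation invariance.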


2. RESEARCH METHOD 

The purpose of this research is to find the best algorithm for the auto-calibration of stereo vision.
The first step of calibration is finding the corresponding points between the two images; the accuracy
of this step determines the accuracy of the stereo vision. The object of this research is a microscopic
object a few millimeters in size. The disparity of the points is converted into the intrinsic parameter
of the camera.

The method used in this research is described in Figure 2. The stereo images are produced using
two cameras. To handle the very narrow view area caused by the small size of the objects, a converged
camera setup is used, because it is hard to place objects in the overlapping area when parallel cameras
are used. A histogram equalization step is required since the illumination of each image or the color
characteristics of each camera may differ [21]. To reduce noise, a combination of a Gaussian filter and
a median filter is applied; both filters have been proposed to improve image quality [22]. The Gaussian
filter is expressed in (4), the median filter in (5), and the combination of both filters in (6).





$$G(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{4}$$

$$M(x, y) = \mathrm{med}\{ f(x - i, y - j),\ (i, j) \in W \} \tag{5}$$

$$f(x, y) = G(x, y) + M(x, y) \tag{6}$$
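
The two filters in (4) and (5) can be illustrated with a short pure-Python sketch. Several points are assumptions rather than details given in this paper: the window W is a square of side 2 * radius + 1, borders are clamped, and the combination in (6) is interpreted as median filtering followed by Gaussian smoothing. All function names are hypothetical.

```python
import math


def gaussian_kernel_2d(sigma, radius):
    """Sample G(x, y) from (4) on a square grid and normalize it to sum to 1."""
    kernel = [[math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
               / (2.0 * math.pi * sigma ** 2)
               for x in range(-radius, radius + 1)]
              for y in range(-radius, radius + 1)]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


def _window(image, y, x, radius):
    """(dy, dx, value) triples of the window W centered on (x, y), borders clamped."""
    h, w = len(image), len(image[0])
    return [(dy, dx, image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)])
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)]


def gaussian_filter(image, sigma=1.0, radius=1):
    """Weighted average of each window using the kernel from (4)."""
    kernel = gaussian_kernel_2d(sigma, radius)
    return [[sum(kernel[dy + radius][dx + radius] * v
                 for dy, dx, v in _window(image, y, x, radius))
             for x in range(len(image[0]))] for y in range(len(image))]


def median_filter(image, radius=1):
    """M(x, y) from (5): the median of the window W around each pixel."""
    def med(values):
        ordered = sorted(values)
        return ordered[len(ordered) // 2]  # window size is odd
    return [[med([v for _, _, v in _window(image, y, x, radius)])
             for x in range(len(image[0]))] for y in range(len(image))]


def denoise(image, sigma=1.0, radius=1):
    """Assumed combination: median filter first, then Gaussian smoothing."""
    return gaussian_filter(median_filter(image, radius), sigma, radius)
```

Applying the median filter first removes isolated impulse noise before the Gaussian filter can spread it over neighboring pixels; other orderings or weightings are equally compatible with (6) as written.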


The result of histogram equalization is processed using a feature extraction algorithm. Three feature
extraction algorithms are used to find the matched correspondence points in each set of images [20].
The matched correspondence points are used for the rectification process [23]. The distance between
each pair of corresponding points is used to extract the stereo parameters, so the output of this
method is the set of stereo calibration parameters [24]. The result of this process can be transformed
into a 3D surface.
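
The histogram equalization step that precedes feature extraction can be sketched as follows. The sketch assumes 8-bit grayscale images stored as nested lists of integers and uses the standard cumulative-histogram mapping; the function name `equalize_histogram` is hypothetical, and the paper does not state which implementation was used.

```python
def equalize_histogram(image, levels=256):
    """Map gray levels through the normalized cumulative histogram."""
    pixels = [v for row in image for v in row]
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    span = len(pixels) - cdf_min
    if span == 0:
        # Constant image: there is no contrast to stretch.
        return [row[:] for row in image]
    lut = [max(0, round((c - cdf_min) * (levels - 1) / span)) for c in cdf]
    return [[lut[v] for v in row] for row in image]
```

Equalizing the left and right images separately brings their intensity distributions closer together, which reduces the effect of illumination differences between the two cameras on descriptor matching.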


[Figure 2 block diagram; recoverable block labels: Features Extraction, Matching, Rectification &
stereo parameter extraction, Disparity extraction, 3D Extraction, 3D Surface]

Figure 2. The distance measurement procedure

Two industrial-standard HD cameras are used in this research. The cameras are equipped with
a 100x lens to enlarge the object. The two images captured by the cameras are then compared and
evaluated using the matching algorithms to find the corresponding points. Figure 3 shows the camera
setup and the object size.





Figure 3. The camera setup and the object size


A millimeter template is used to measure the size of the object and as a reference for
the auto-calibration. The dimensions of the object are shown in Figure 4 (a), whereas Figure 4 (b)
shows the five pairs of image sets generated using the system for testing purposes. Each set is
compared using three matching algorithms: SIFT, SURF, and ORB.

Figure 4. (a) The dimensions of the object and (b) the datasets used in the research


3. RESULTS AND ANALYSIS 

SIFT, SURF, and ORB have been executed on each pair of image sets to find the best method for
image matching. In the obtained results, each green line indicates a correspondence between a point in
the left image and a point in the right image, so the number of connecting lines shows the number of
matched points. However, each algorithm still produces errors when it fails to match the correct
points. The result of this matching is used to generate the calibration parameters of the stereo vision.


3.1. Matching results using SIFT, SURF, and ORB 

The results of applying the SIFT, SURF, and ORB algorithms to the captured object are given in
Figures 5 (a-c), respectively. As seen for image sets 1 and 5 in Figure 5 (a), only a few lines are
generated by the SIFT algorithm; the background of these images is highly similar. The result of the
SURF algorithm in Figure 5 (b) shows that for image set 1 only a few lines are generated, and some of
them indicate major errors, while the remaining image sets show correct corresponding points. The
result of the ORB algorithm in Figure 5 (c) likewise shows that only a few lines are generated for
image set 1, some of which indicate major errors, while the four other image sets show correct
corresponding points.

The comparison of the matching results using SIFT, SURF, and ORB is presented in Table 1, which
gives the matching accuracy of the three algorithms. The table shows that the SIFT algorithm gives the
highest average accuracy (87.08%, against 74.17% for SURF and 56.73% for ORB). However, the percentage
of correct lines varies depending on the image characteristics. For images with high similarity, SURF
fails to give a good result, whereas ORB generates many lines but with a high error rate.





Figure 5. Experiment results of the matching algorithms using: (a) SIFT, (b) SURF, and (c) ORB


In Figure 5, parallel lines that group together to form a thicker band indicate accurate matches.
In contrast, lines that are out of parallel and crisscross each other, creating a dispersed pattern,
indicate inaccurate matches.


The results in Table 1 are compared in Table 2 with the results of Karami et al. [13] for the case
of varying intensity. In both works, SIFT performs better than the other methods. Table 3 compares the
computational time of each algorithm. SIFT requires a longer time than the others because of its
algorithmic complexity, and it takes longer still when the images have highly similar texture. Figure 6
indicates that the ORB algorithm has the fastest computation time for all image sets, taking less than
0.5 s of processing time. However, the ORB algorithm gives lower matching rates than the other methods.
The line chart in Figure 6 also indicates that the computation time increases with the complexity of
the images: image sets 1 and 5 give the longest computation times because of their complexity.


Table 1. Comparison of the matching results using the SIFT, SURF, and ORB techniques 


          SIFT                             SURF                             ORB
No        Lines  Correct Point  % Correct  Lines  Correct Point  % Correct  Lines  Correct Point  % Correct
1         15     14             93.33%     14     2              14.29%     78     24             30.77%
2         150    120            80.00%     443    430            97.07%     89     70             78.65%
3         400    356            89.00%     278    256            92.09%     254    224            88.19%
4         345    321            93.04%     600    467            77.83%     345    156            45.22%
5         20     16             80.00%     125    112            89.60%     375    153            40.80%
Average                         87.08%                           74.17%                           56.73%
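
The percentages in Table 1 follow directly from the line counts. As a check, the short script below recomputes them, taking the average as the unweighted mean of the five per-set percentages (which is what reproduces the reported values). The variable and function names are hypothetical.

```python
# (lines, correct points) per image set, copied from Table 1.
TABLE_1 = {
    "SIFT": [(15, 14), (150, 120), (400, 356), (345, 321), (20, 16)],
    "SURF": [(14, 2), (443, 430), (278, 256), (600, 467), (125, 112)],
    "ORB": [(78, 24), (89, 70), (254, 224), (345, 156), (375, 153)],
}


def percent_correct(lines, correct):
    """Percentage of matched lines that connect correct points."""
    return 100.0 * correct / lines


def average_accuracy(rows):
    """Unweighted mean of the per-set percentages."""
    rates = [percent_correct(lines, correct) for lines, correct in rows]
    return sum(rates) / len(rates)


for name, rows in TABLE_1.items():
    print(f"{name}: {average_accuracy(rows):.2f}%")
```

Because the mean is unweighted, image sets with very few lines (such as set 1 for SIFT and SURF) influence the average as much as sets with several hundred lines.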


Table 2. Comparison of the matching results between Karami and this work 


          Match Rate (%)
Method    Karami [13]    This Work
SIFT      76.7           87.08
SURF      72.6           74.17
ORB       63.6           56.73


Table 3. Computational time using the SIFT, SURF, and ORB techniques 


             Computational Time (s)
Image Set    SIFT     SURF     ORB
1            2.114    0.926    0.052
2            1.149    0.777    0.039
3            0.788    0.6      0.033
4            0.858    0.576    0.033
5            1.36     1.149    0.075

Figure 6. Computational time comparison chart


3.2. Image rectification 

The matched points from the previous steps are used to rectify the images. The difference in
position between the source and destination points is used as the reference for the transformation.
Figures 7a and 7b show the distorted images from the left and right cameras. Figure 7a is used as the
reference, while Figure 7b is the image to be transformed. The result of transforming Figure 7b is
displayed in Figure 7c. This transformation is based on the homography equation to reduce
distortion [25].

Figure 7. Rectification result: (a) left image as reference, (b) right image, and (c) the result of rectification
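
A homography between source and destination points can be illustrated with the minimal case of exactly four correspondences. The sketch below is an assumption-laden illustration rather than the procedure used in this work: it fixes the last matrix entry to 1, solves the resulting 8 x 8 linear system by Gaussian elimination, and uses hypothetical function names. With many, partly incorrect matches, a least-squares or robust estimator would normally be used instead.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_4_points(src, dst):
    """Estimate H (last entry fixed to 1) mapping four src points onto dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear([[float(t) for t in row] for row in a], [float(t) for t in b])
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map a point through H, including the projective division."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

Warping every pixel of the right image through the estimated matrix (or its inverse) aligns it with the left reference image, which corresponds to the transformation from Figure 7b to Figure 7c.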


3.3. 3D surface generation 


The matching process yields the distance between corresponding points. Using these distance
values, a 3D surface of the object can be generated by projecting them onto the z-axis [26, 27]. The
distance between corresponding points in the two images is assigned as the depth value. If the distance
is small, the object is closer to the camera, and vice versa. The depth value of each pixel is then
converted to grayscale to distinguish the depth of each point. Figure 8 shows the disparity map of the
dataset generated using SIFT adjustment. The correlated points produced by SIFT are used to calculate
the stereo camera parameters. The result shows that the algorithm successfully generates the stereo
match; however, the output is noisy. Using the depth value as the z-axis produces the 3D view shown in
Figure 9. The algorithm successfully produces a 3D reconstruction, but the noise reduces the
image quality.
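
The conversion of depth values to grayscale and to surface points can be sketched as follows. The linear rescaling to the range 0..255 is an assumption, since the paper does not specify the mapping, and the function names are hypothetical.

```python
def disparity_to_gray(disparity):
    """Linearly rescale disparity values to gray levels 0..255."""
    values = [v for row in disparity for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        # Flat map: every pixel has the same depth.
        return [[0 for _ in row] for row in disparity]
    scale = 255.0 / (hi - lo)
    return [[int(round((v - lo) * scale)) for v in row] for row in disparity]


def surface_points(depth):
    """List (x, y, z) points, using each pixel's depth value as z."""
    return [(x, y, z) for y, row in enumerate(depth) for x, z in enumerate(row)]


gray = disparity_to_gray([[2.0, 4.0], [6.0, 10.0]])
points = surface_points(gray)
print(points[0], points[-1])
```

The resulting point list can be passed to any surface plotting tool; smoothing the depth map beforehand would reduce the noise visible in the reconstruction, at the cost of some detail.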





Figure 8. Generated depth value based on the SIFT matching algorithm

Figure 9. 3D surface reconstruction


4. CONCLUSION 


In this paper, three image matching techniques, SIFT, SURF, and ORB, have been compared for
a stereo autocalibration system. SIFT shows the best performance in most of the scenarios under
consideration. In the special case in which the images contain multiple highly similar textures, SURF
fails to give good results. In the ORB implementation, the features are mostly concentrated on objects
at the center of the image, whereas with SIFT and SURF the features are distributed over the image.
The 3D reconstruction has been generated successfully, but noise reduces the quality of the images.
For future work, a good filtering algorithm is required to obtain a better result without sacrificing
image detail.